Automatic multi-modal dialogue scene indexing
Author
Abstract
An automatic algorithm for indexing dialogue scenes in multimedia content is proposed. The content is segmented into dialogue scenes using the state transitions of a hidden Markov model (HMM). Each shot is classified using both audio and visual information to determine the state/scene transitions of this model. Face detection and silence/speech/music classification are the basic tools used to index the scenes. Face information is extracted by applying heuristics to skin-colored regions, while audio analysis examines the signal energy, periodicity and zero-crossing rate (ZCR) of the audio waveform. Simulation results show that dialogue scenes can be indexed automatically using the proposed algorithm.
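The abstract names signal energy, periodicity and zero-crossing rate as the audio features behind the silence/speech/music decision, but not the decision rules. The Python sketch below computes the three features per frame and applies one plausible set of rules. The 8 kHz sampling rate, frame length, every threshold and the function names (`classify_segment` and its helpers) are assumptions made for illustration, not the authors' values.

```python
import statistics
from typing import List, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Short-time energy: mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def periodicity(frame: Sequence[float], min_lag: int, max_lag: int) -> float:
    """Largest normalized autocorrelation r(lag)/r(0) over [min_lag, max_lag].

    Values near 1 indicate a strongly periodic frame; an all-zero frame gives 0.
    """
    r0 = sum(x * x for x in frame)
    if r0 == 0.0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, min(max_lag, len(frame) - 1) + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        best = max(best, r / r0)
    return best


def classify_segment(
    samples: Sequence[float],
    frame_len: int = 400,          # 50 ms at an assumed 8 kHz sampling rate
    energy_thresh: float = 1e-4,   # hypothetical silence threshold
    periodic_thresh: float = 0.8,  # hypothetical "frame is periodic" threshold
    music_ratio: float = 0.8,      # hypothetical share of periodic frames for music
    zcr_std_thresh: float = 0.05,  # hypothetical bound on ZCR variability for music
    min_lag: int = 20,             # 400 Hz upper pitch bound at 8 kHz
    max_lag: int = 160,            # 50 Hz lower pitch bound at 8 kHz
) -> str:
    """Label a segment 'silence', 'speech' or 'music' (illustrative rules only)."""
    frames: List[Sequence[float]] = [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    if not frames:
        raise ValueError("segment shorter than one frame")
    mean_energy = sum(frame_energy(f) for f in frames) / len(frames)
    if mean_energy < energy_thresh:
        return "silence"
    zcrs = [zero_crossing_rate(f) for f in frames]
    periodic_share = sum(
        periodicity(f, min_lag, max_lag) > periodic_thresh for f in frames
    ) / len(frames)
    # Sustained music tends to stay periodic with a steady ZCR; speech alternates
    # voiced (periodic), unvoiced and pause frames, so both measures fluctuate.
    if periodic_share >= music_ratio and statistics.pstdev(zcrs) < zcr_std_thresh:
        return "music"
    return "speech"
```

Under these rules a steady tone is labeled music, while the same tone switched on and off every frame drops to a 50% periodic-frame share and is labeled speech. Real thresholds would have to be tuned on labeled audio.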
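The abstract does not give the HMM's states, observation alphabet or probabilities. As a minimal sketch, assume a two-state HMM (dialogue vs. other) whose observation for each shot is a combined audio-visual label; Viterbi decoding then yields the most likely state per shot, and each run of the dialogue state becomes a dialogue scene. The state names, the three observation symbols and every probability below are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical two-state model: the hidden state says whether a shot belongs
# to a dialogue scene. All probabilities are illustrative, not from the paper.
STATES = ("dialogue", "other")
START = {"dialogue": 0.5, "other": 0.5}
TRANS = {
    "dialogue": {"dialogue": 0.8, "other": 0.2},
    "other": {"dialogue": 0.2, "other": 0.8},
}
# Hypothetical per-shot observations: "face_speech" (face detected and speech),
# "speech" (speech without a detected face) or "other".
EMIT = {
    "dialogue": {"face_speech": 0.7, "speech": 0.2, "other": 0.1},
    "other": {"face_speech": 0.1, "speech": 0.3, "other": 0.6},
}


def viterbi(obs: Sequence[str]) -> List[str]:
    """Most likely hidden state for each shot (Viterbi in log space)."""
    if not obs:
        return []
    score: Dict[str, float] = {
        s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES
    }
    backptrs: List[Dict[str, str]] = []
    for o in obs[1:]:
        new_score: Dict[str, float] = {}
        back: Dict[str, str] = {}
        for s in STATES:
            # Best predecessor of state s at this step.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][o])
            back[s] = prev
        score = new_score
        backptrs.append(back)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for back in reversed(backptrs):
        state = back[state]
        path.append(state)
    return path[::-1]


def dialogue_scenes(obs: Sequence[str]) -> List[Tuple[int, int]]:
    """Inclusive (first_shot, last_shot) ranges decoded as dialogue."""
    scenes: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, state in enumerate(viterbi(obs)):
        if state == "dialogue" and start is None:
            start = i
        elif state != "dialogue" and start is not None:
            scenes.append((start, i - 1))
            start = None
    if start is not None:
        scenes.append((start, len(obs) - 1))
    return scenes
```

For the shot sequence `other, other, face_speech, face_speech, speech, face_speech, other, other`, this model decodes one dialogue scene spanning shots 2 to 5. The single `speech` shot without a detected face stays inside the scene because leaving and re-entering the dialogue state would cost two low-probability transitions.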
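The abstract says face information comes from heuristics applied to skin-colored regions without naming them. The sketch below swaps in a widely used explicit RGB skin rule (Kovač, Peer and Solina, intended for roughly daylight illumination) and keeps 4-connected skin regions whose size, height-to-width ratio and fill ratio look face-like. The shape thresholds and the function names are hypothetical, not the paper's heuristics.

```python
from collections import deque
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]     # (R, G, B), each 0..255
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def is_skin(pixel: Pixel) -> bool:
    """Explicit RGB skin rule (Kovac et al.); a stand-in, not the paper's method."""
    r, g, b = pixel
    return (
        r > 95 and g > 40 and b > 20
        and max(pixel) - min(pixel) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def face_candidates(
    image: Sequence[Sequence[Pixel]],
    min_area: int = 20,                              # hypothetical minimum size (pixels)
    aspect_range: Tuple[float, float] = (0.8, 2.0),  # hypothetical height/width bounds
    min_fill: float = 0.4,                           # hypothetical skin share of the box
) -> List[Box]:
    """Bounding boxes of 4-connected skin regions that pass simple shape heuristics."""
    height = len(image)
    width = len(image[0]) if height else 0
    mask = [[is_skin(image[y][x]) for x in range(width)] for y in range(height)]
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill of one connected skin region.
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            area, y_min, y_max, x_min, x_max = 0, y0, y0, x0, x0
            while queue:
                y, x = queue.popleft()
                area += 1
                y_min, y_max = min(y_min, y), max(y_max, y)
                x_min, x_max = min(x_min, x), max(x_max, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < height and 0 <= nx < width and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            box_h, box_w = y_max - y_min + 1, x_max - x_min + 1
            aspect = box_h / box_w
            fill = area / (box_h * box_w)
            if area >= min_area and aspect_range[0] <= aspect <= aspect_range[1] and fill >= min_fill:
                boxes.append((x_min, y_min, x_max, y_max))
    return boxes
```

The shape checks discard thin or sparse skin-colored regions (for example a strip of skin-toned background) that a pixel-level rule alone would accept.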
Similar resources
Comparative analysis of hidden Markov models for multi-modal dialogue scene indexing
A class of audio-visual content is segmented into dialogue scenes using the state transitions of a novel hidden Markov model (HMM). Each shot is classified using both the audio track and the visual content to determine the state/scene transitions of the model. Simulations with circular and left-to-right HMM topologies show that both perform very well with multi-modal inputs. ...
Multi-modal Multi-label Semantic Indexing of Images Based on Hybrid Ensemble Learning
Automatic image annotation (AIA) refers to the association of words with whole images, which is considered a promising and effective approach to bridging the semantic gap between low-level visual features and high-level semantic concepts. In this paper, we formulate the task of image annotation as a multi-label, multi-class semantic image classification problem and propose a simple yet effective m...
Multi-modal Video Summarization Using Hidden Markov Models for Content-based Multimedia Indexing
Yaşaroğlu, Yağız. M.Sc., Department of Electrical and Electronics Engineering. Supervisor: Associate Professor A. Aydın Alatan. September 2003, 75 pages. This thesis deals with scene level summarization of story-based videos. Two different approaches for story-based video summarization are investigated. ...
Multi-modal recording, analysis and indexing of poster sessions
A new project on multi-modal analysis of poster sessions is introduced. We have designed an environment dedicated to recording poster conversations using multiple sensors and collected a number of sessions, which are annotated with a variety of multi-modal information, including utterance units for individual speakers, backchannels, nodding, gazing, and pointing. Automatic speaker diarization,...
Multimodal corpora for human-machine interaction research
In recent years, human-machine interaction has grown in importance. One approach to ideal human-machine interaction is to develop a multi-modal system that behaves like a human being. This paper gives an overview of multimodal corpora currently being developed in Japan for this purpose. The paper describes databases of 1) multi-modal interaction, 2) audio-visual speech, 3) spoken dialogue wit...